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I. INTRODUCTION 

The problem of finding variable configurations that 
minimize the energy of a system with competitive inter- 
actions has been and still is a central one in the study of 
complex systems, like spin glasses in physics, protein fold- 
ing and regulatory networks in biology, and optimization 
problems in computer science (see e.g., 

Among the tools for numerical investigations of com- 
plex systems at low temperatures the simulated anneal- 
ing (SA) algorithm ^ and its variants have played a ma- 
jor role. Such stochastic processes satisfy detailed bal- 
ance and their behavior can be compared with static and 
dynamical mean-field calculations. However, in prob- 
lems in which the interest is focused on zero temperature 
ground states and where the proliferation of metastable 
states causes an exponential slowdown in the equilibra- 
tion rate, the applicability of SA-like algorithms is limited 
to relatively small system sizes. 

In computer science the field of combinatorial opti- 
mization |3l deals precisely with the general issue of clas- 
sifying the computational difficulty ( "hardness" ) of min- 
imization problems and of designing search algorithms. 
Similarly to statistical physics models, a generic combina- 
torial optimization problem is composed of many discrete 
variables — e.g.. Boolean variables, finite sets of colors or 
Ising spins — which interact through constraints typically 
involving a small number of variables, that in turn sum 
up to give the global cost-energy function. 

When the problem instances are extracted at random 
from nontrivial ensembles (that is ensembles which con- 
tains many instances that are hard to solve), computer 
science meets physics in a very direct way: many of the 
models considered to be of basic interest for Computer 
Science are nothing but spin glasses defined over finite 
connectivity random graphs, the well studied diluted spin 
glasses 0, Their associated energy function counts 
the number of violated constraints in the original coni- 



"IbattagliOsissa.itj 
fkoIarmiOsissa.itf 
^ ^zecch ina@ictp . t rieste . it] 



binatorial problem (with ground states corresponding to 
optimal solutions). Understanding the onset of hardness 
of such systems is at the same time central to computer 
science and to T = statistical physics with surprisingly 
concrete engineering applications. For instance, among 
the most effective error correcting codes and data com- 
pression methods are the Low Density Parity Check al- 
gorithms [l3| , which indeed implement an energy 
minimization of a spin glass energy defined over a sparse 
random graph. In such problems, the choice of the graph 
ensemble is a part of the designing techniques, a fact that 
makes spin glass theory directly applicable. 

The above example is however far from representing 
the general scenario for combinatorial problems: in many 
situations the probabilistic set up is not defined and, con- 
sequently, the notion of typical-case analysis does not 
play any obvious role. The study of the connection (if 
any) between worst-case and typical-case complexity is 
indeed an open one and very few general results are 
known p3 | . Still, a precise understanding of non-trivial 
random problem instances promises to be important un- 
der many aspects. New algorithmic results as well as 
many mathematical issues have been put forward by the 
statistical physics studies, with examples ranging from 
phase transitions '15', Tdj and out-of-equilibrium analysis 
of randomized algorithms |l7j to new classes of message- 
passing algorithms lSJ20l|. 

The physical scenario for the diluted spin glasses ver- 
sion of hard combinatorial problems predicts a trapping 
in metastable states for exponentially long times of local 
search dynamic process satisfying detailed balance. De- 
pending on the models and on the details of the process — 
e.g., cooling rate for SA — the long time dynamics is dom- 
inated by different types of metastable states at differ- 
ent temperatures |2l|. A common feature is that at 
zero temperature and for simulation times which are sub- 
exponential in the size of the problem there exists an ex- 
tensive gap in energy which separates the blocking states 
from true ground states. 

Such behavior can be tested on concrete random 
instances which therefore constitute a computational 
benchmark for more general algorithms. Of particular 
interest for computer science are randomized search pro- 
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cesses which do not properly satisfy detailed balance and 
that are known (numerically) to be more efficient than 
SA-like algorithms in the search for ground states [2^ . 
Whether the physical blocking scenario applies also to 
these artificial processes, which are not necessarily char- 
acterized by a proper Boltzmann distribution at long 
times, is a difficult open problem. The available numeri- 
cal results and some approximate analytical calculations 
I23L 01 seem to support the existence of a thermody- 
namical gap, a fact which is of up-most importance for 
optimization. For this reason (and independently from 
physics), during the last decade the problem of finding 
minimal energy configurations of random combinatorial 
problems similar to diluted spin-glasses — e.g., random K- 
Satisfiability (K-SAT) or Graph Coloring — has become a 
very popular algorithmic benchmark in computer science 

i 

In the last few years there has been a great progress in 
the study of spin glasses over random graphs which has 
shed new light on mean-field theory and has produced 
new algorithmic tools for the study of low energy states in 
large single problem instances. Quite surprisingly, prob- 
lems which were considered to be algorithmically hard for 
local search algorithms, like for instance random K-SAT 
close to a phase boundary, turned out to be efficiently 
solved by the Survey Propagation (SP) algorithm arising 
from the replica symmetry broken (RSB) cavity approach 
to diluted spin glasses. Such type of results calls for a 
rigorous theory of the functioning of SP (which is a non 
local process) and bring new mathematical challenges of 
potential practical impact. 

Scope of this paper is to display a set of new numerical 
and algorithmic results which complete previously pub- 
lished results on the SP algorithm. We shall deal only 
with the random K-SAT problem even though we ex- 
pect the algorithmic outcomes to be applicable to other 
similar problems like, for instance, the random graph col- 
oring. 

The paper is organized as follows. In Sections m IIIII 
we briefly review the known results on random K-SAT 
together with the SP equations over single instances at 
finite pseudo-temperature. We discuss as well in II VI how 
the SP algorithm can be modified in order to study the 
region of parameters with finite ground state energy (UN- 
SAT phase), where not all constraints of the underlying 
random K-SAT problem can be satisfied simultaneously. 
In Sec. 0we discuss then the performance of SP as an 
optimization device. At variance with the SAT phase in 
which many clusters of zero energy configurations coex- 
ist and where SP works efficiently without need of cor- 
recting variable assignments, in the UNSAT phase an 
efficient implementation of SP requires the introduction 
of — at least — a very simple form of backtracking proce- 
dure (similar to the one proposed in [25[). We show that a 
linear time backtrack is enough to reach energies compat- 
ible with those predicted by the analytic calculations in 
the infinite size limit in the relevant region of parameters. 
We give moreover numerical evidence for the existence 



of threshold states for one of the most efficient random- 
ized local search algorithms for solving random K-SAT, 
namely WalkSat [i^. We display a blocking mechanism 
at an energy level which is definitely above the lower 
bound for the dynamical threshold states predicted by 
the stability analysis of the 1-RSB cavity equations. Fi- 
nally, for the deep UNSAT phase, we report on numeri- 
cal data on convergence times for both WalkSat and SA 
which are in agreement with the predicted existence of 
full RSB (f-RSB) phases. Conclusions and perspectives 
are briefly discussed in Sec. lVIl 



II. BRIEF REVIEW OF RANDOM K-SAT 

K-SAT is a NP-complete problem ^ (for K > 2) 
which lies at the root of combinatorial optimization. It 
is very easy to state: Given N Boolean variables and M 
constraints taking the form of clauses, K-SAT consists 
in asking whether it exists an assignment of the variables 
that satisfies all constraints. Each clause contains exactly 
K variables, either directed or negated, and its truth 
value is given by the OR function. Since the same vari- 
able may appear directed or negated in different clauses, 
competitive interactions among clauses may set in. 

As mentioned in the introduction, in the last decade 
there has been a lot of interest on the random version 
of K-SAT: for each clause the variables are chosen uni- 
formly at random (with no repetitions) and negated with 
probability 1/2. 

In the large N limit, random K-SAT displays a very 
interesting threshold phenomenon. Taking as control pa- 
rameter the ratio of number of clauses to number of vari- 
ables, a = M/N, there exists a phase transition at a fi- 
nite value ac{K) of this ratio. For a < ac{K) the generic 
problem is satisfiable (SAT), for a > ac{K) the generic 
problem is not satisfiable (UNSAT). 

This phase transition has been seen numerically 
and it is of special interest since extensive experiments 
[3] have shown that the instances which are algorithmi- 
cally hard to solve are exactly those where a is close to 
Uc- Therefore, the study of the SAT/UNSAT phase tran- 
sition is considered of crucial relevance for understand- 
ing the onset of computational complexity in typical in- 
stances p. A lot of work has been focused on the study 
of both the decision problem (i.e., determining with a 
YES/NO answer whether a satisfying assignment exists), 
and the optimization version in which one is interested 
in minimizing the number of violated clauses when the 
problem is UNSAT (random MAX-K-SAT problem). 

On the analytical side, there exists a proof that the 
threshold phenomenon exists at large N although 
the fact that the corresponding ac has a limit when 
N ^ 00 has not yet been established rigorously. Upper 
bounds ajm(K) on ac have been found using ffist mo- 
ment methods |29| and variational interpolation methods 
[SOf . and lower bounds ai,B{K) have been found using 
either explicit analysis of some algorithms [sij , or some 
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second moment methods '325. For random MAX-K-SAT 
theoretical bounds are also known (3^ i37,] , as well as rig- 
orous results on the running times of random walk and 
approximation algorithms 134 . Issl l36| . 

Recently, the cavity method of statistical physics has 
been applied to K-SAT [H HI HI] and the thresholds 
have been computed with high accuracy. A lot of work is 
going on in order to provide a rigorous foundation to the 
cavity results and we refer to |38l | for a more complete 
discussion of these aspects. 

In what follows we shall concentrate on the K = 3 
case and we will be interested in analyzing the behavior 
of different algorithms in the region of parameter in which 
the random formulas are expected to be hard to solve or 
to minimize. The energy function which is used in the 
zero temperature statistical mechanics studies is taken 
proportional to the number of violated clauses in a given 
problem so that a zero energy ground state corresponds 
to a satisfying assignment. The energy of a single clause 
is positive (equals 2 for technical reasons) if the clause is 
violated and zero if it is satisfied. The overall energy is 
obtained by summing over clauses and reads 
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nil (1 + ^a,.<) 



(1) 



where sf is the i-th binary (spin) variable appearing in 
clause a and the coupling Ja.i takes the value 1 (resp. -1) 
if the corresponding variable appears not negated (resp. 
negated) in clause a. For instance the clause (xi V2;2Va;3) 
has an energy i(l-|-si)(l — S2)(l + S3) where the Boolean 
variables Xi = {0, 1} are connected to the spin variables 
by the transformation Si — (—1)^'. 

The phase diagram of the random 3-SAT problem as 
arising from the statistical physics studies can be very 
briefly summarized as follows. 

For a < 3.86, the T = phase is at zero energy (the 
problem is SAT). The entropy density is finite and the 
phase is Replica Symmetric (RS) and unfrozen. Roughly 
speaking, this means that there exists one giant cluster 
of nearby solutions and that the effective fields vanish 
linearly with the temperature. 

For 3.86 < a < 3.92, there is a fuU RSB phase. The 
solution space breaks in clusters and the order parame- 
ter becomes a nested probability measure in the space of 
probability distribution describing cluster to cluster fluc- 
tuations. The phase is still SAT and unfrozen "s?, "4^. 

At a ~ 3.92 there is a discontinuous transition toward 
a clustered frozen phase [l^ IT^ . Up to a = 4.15 the 
phase is f-RSB while above the 1-RSB solution becomes 
stable j41j|. The complexity, that is the normalized loga- 
rithm of the number of clusters, is finite in this region. 
At finite energy there exist even more metastable states 
which act as dynamical traps. The 1-RSB metastable 
states become unstable at some energy density Ecia) 
which constitutes a lower bound to the true dynamical 
threshold energy (see Sec. Illll for more details). 

At a = 4.2667 the ground state energy becomes pos- 
itive and therefore the typical random 3-SAT problem 





LU 



0.0025 



0.002 



0.0015 



0.001 



0.0005 



LU 



0.0007 
0.0006 
0.0005 
0.0004 
0.0003 
0.0002 
0.0001 








- SAT / 


' IJNSAT ^ - 







4.15 4.25 4.35 



f-RSB SAT 
UNFROZEN 



RS SAT 
PHASE 



1-RSB 
SAT 
FROZEN 



f-RSB SAT 
FROZEN 



1-RSB 
UNSAT 



f-RSB 
UNSAT 



3.7 3.8 3.9 4 4.1 4.2 4.3 
Clauses to variables ratio 



4.4 4.5 



FIG. 1: The solid line is an extimation for the ground state 
energy, while the dashed curve represents the Gardner energy, 
providing a lower bound for the threshold states (numerical 
data adapted from ref. 4T'|). In the inset we show that the 
difference between the Gardner and the ground state energy 
is strictly positive in the small 1-RSB stable region around the 
SAT/UNSAT transition critical point (indicated by the ver- 
tical line): it is expected that it is hard for heuristics based 
on local search to find assignments inside the closed area de- 
limited by the energy gap curve. 



becomes UNSAT. At the same point the complexity van- 
ishes. The phase remains 1-RSB up to a = 4.39 where 
an instability toward a zero complexity full RSB phase 
appears. 

In the region 4.15 < a < 4.39, the 1-RSB ansatz for 
the ground state is stable against higher orders of RSB, 
but the 1-RSB predictions become unstable for energies 
larger than the Gardner energy. The instability line in- 
tersects with the 1-RSB ground state extimation at the 
two extremes of the interval, inside which it provides a 
lower bound to the true threshold energy (see Ref. 
for a comprehensive discussion). 

Further (preliminary) f-RSB corrections suggest that 
the true threshold states have energies very close to the 
lower bound and hence the interval A = [4.15,4.39] 
should be taken as the region where to take really hard 
benchmarks for algorithm testing. As displayed in Fig.^ 
the actual value of the energy gap is very small close to 
the end points of A. In order to avoid systematic finite 
size errors, numerical simulations should be done close to 
the SAT/UNSAT point, i.e., far from the end point of A. 
Consistently with the fact that finite size fluctuations are 
relatively big (0(\/7V)), even close to ac problem sizes of 
the order at least of iV = 10^ are necessary in order to 
observe a matching with the analytic predictions. 



III. BRIEF REVIEW OF SP EQUATIONS 

The 1-RSB cavity equations which have been used to 
study the typical phase diagram of random K-SAT be- 
come the SP equations once reformulated to run over 
single problem instance jl8j |. This is done by avoiding 
the averaging process with respect to the underlying ran- 
dom graphs. Thanks to the self-averaging property of the 
random K-SAT free energy p.? , the SP equations can be 
used both to re-derive the phase diagram of the problem 
and, more important, to access detailed information of al- 
gorithmic relevance about a given problem instance. In 
particular, the SP equations provide information about 
the statistical behavior of the single variables in the sta- 
ble and metastable states of given energy density. 

The 1-RSB cavity equations are iterative equations 
(averaged over the disorder) for the probability distribu- 
tion functions (pdf) of effective fields that describe their 
cluster-to-cluster fluctuations. The order parameter is 
a probability measure in the space of pdf's; it tells the 
probability that a randomly chosen variable has a certain 
associated pdf in states at a given energy density. 

In SP and more in general in the cavity approach, one 
assumes to know pdf's of the fields of all variables in the 
temporary absence of one of them. Then one writes the 
induced pdf of the local field acting on this "cavity" vari- 
able in absence of some other variable interacting with 
it {i.e., the so called Bethe lattice approximation for the 
problem). These relations define a closed set of equations 
for the pdf's that can be solved iteratively. The equa- 
tions are exact if the cavity variables acting as inputs are 
uncorrelated, e.g., over trees, or are conjectured to be 
an asymptotically exact approximation over locally tree- 
like structures [Tg where the typical distance between 
randomly chosen variables diverges in the large N limit 
(as In for diluted random graphs). The full list of the 
cavity fields over the entire underlying graph, in the SP 
implementation, constitutes the order parameter. From 
the cavity fields one may determine the total field acting 
on each variable in all metastable states of given energy 
density and this information can be used for algorithmic 
purposes. 

A clear formalism for the single sample analysis is given 
by the factor graph representation [13 of K-SAT: vari- 
ables are represented by N circular "variable nodes" la- 
beled with letters i,j,k,... whereas the K-body inter- 
actions are represented by M square "function nodes" 
(carrying the clause energies) labeled by a,b,c, . . . (see 
Fig.ij) 

For random 3-SAT, function nodes have connectivity 
3, variable nodes have a Poisson connectivity of average 
3a and the overall graph is bipartite. The total energy 
is nothing but the sum of energies of all function nodes 
as given by Eq. (Q). 

Adopting the message-passing notation and strictly 
following 18], we call u-messages the contribution to 
the cavity fields coming from the different connected 
branches of the graph. In SP the messages along the 
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FIG. 2: Factor graph representation. Variables are repre- 
sented by circles, and are connected by function nodes, rep- 
resented by squares; if a variable appears negated in a clause, 
the connecting line is dashed. 



links of the factor graph have a functional nature car- 
rying information about distributions of w-messages over 
the states at a given value of the energy, fixed by a La- 
grange multiplier y: we call these distributions of mes- 
sages u-surveys. The SP equations can be written at any 
"temperature" (the inverse of the Lagrange multiplier y 
is actually a pseudo-temperature, see l^]). However they 
acquire a particularly simple form in the limit l/y — > 0, 
which is the limit of interest for optimization purposes, 
at least in the SAT region. 

In K-SAT, the u-surveys are parameterized by two real 
numbers and SP can be implemented very efficiently. 
Each edge a — s- i, from a function node a to a vari- 
able node i, carries a u-survey Qa-^i(u). From these u- 
surveys one can compute the cavity fields hi^i, for every 
b S V{i), which in turn determine new output u-surveys 
(see Fig.OJ. 

Very schematically, the SP equations can be imple- 
mented as follows. Let V{i) be the set of function nodes 
connected to the variable i, V{a) the set of variables con- 
nected to the function node a; let us denote by V{i) \ a 
and V{a) \i the same sets deprived respectively of the 
clause a and of the variable i. Given then a random 
initialization of all the u-surveys Qa^i(u), the function 
nodes are selected sequentially at random and the u- 
surveys are updated according to a complete set of cou- 
pled functional equations (see Fig. 13 for the notation): 




xexp(y(|^ Ub^j\- J2 (2) 

&GV(i)\a beVU)\a 
Qa^,{u) = JvPa,,S{u-Ua^,{{hj^a})), (3) 

where the Ci-^a's are normalization constants, the func- 
tion Ua^i is: 

Ua-,i{{hj^a}) = Ja,t Y\. ^(Jajhj^a), (4) 
3£V{a)\i 
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FIG. 3: Cavity fields and M-messages. The M-survey for the u- 
message Ma-.i depends on the pdf's of the cavity fields hjj^-,a 



and hi. 



These are on the other side dependent on the 



ii-surveys for the u-messages incoming to the variables ji and 

j2- 



and the integration measures are given by: 

beVU)\a. 



(5) 



where = 1 — — 7?^^^, the above set of equations 
(|2I3|) defines a non-Hnear map over the 77's. 

Once a fixed point is reached, from the Ust of the u- 
surveys one may compute the normahzed pdf of the local 
field acting on each variable: 



X exp 

bev{i) 



Pi{H) = C, [vQ,5(h- "fc- 

bev{i) 

bev{i) 

T^Qi = W Qb^i{ub^i) duh^i. (9) 
bev{i) 



It should be remarked that Pi{H) is in general different 
from the family of cavity fields pdf's Pi^b{h) computed 
by mean of |j2Jl. 

From the knowledge of the cavity and local fields pdf's, 
one derives the (Bethe) free energy at the level of 1-RSB: 



j£Via)\z 



(6) 



1 

N 



M 



N 



Parameterizing the u-surveys as 
Qa^i{u) — rf^^^5{u) + r]'^^-S{u — 1) + rj^^^5{u + 1) (7) where Ti is the connectivity of the variable i and: 
I 



^ y-' iev(a) 



-y mm 

{cj,A£V(a)} 





E '^b^i 


\ ieV{a) 


beV{i)\a 



(7i+ E l^b- 
fceV(i)\a 



= --\n{ I VQ, exp 



2/(1 E ""^^l " E l""^- 

aeV(i) a6V(i) 



-ln(C,). 

y 



ill) 



Here, Ea is the energy contribution of the function node 
a. The maximum value of the free-energy functional pro- 
vides a lower bound estimation of the ground state en- 
ergy of the Hamiltonian Q defined on the sample. In 
the SAT region the free-energy functional is always 
non positive and it is increasing in the limit y ^ 00; in 
the UNSAT region, on the contrary, it exhibits a positive 
maximum for y = y* (see [T3|'). 

From the free-energy density of a given instance, it 
is straightforward to compute numerically its complex- 
ity E(y) = d^{y)/d{l/y) and its energy density e{y) — 
d{y^{y))/dy. We remind that the complexity is linked 
to the number of pure states (i.e., clusters of configu- 



rations) of energy E, by the defining relation M{E) = 
exp {NTj{E)). The energy level represented by the largest 
number of configurations, cth, is given by: 

E(et,,) = max(S(S)). (12) 

E 

Further RSB corrections may be needed to locate the pre- 
cise value of eth , which is in any case lower bounded the 
largest energy of 1-RSB stable states, the so called Gard- 
ner energy Eg ■ It is expected that local search strategies 
get trapped at energies close, but not necessarily equal, to 
the threshold energy (see refs. [Ul for a throrough discus- 
sion on the role of the iso-complexity states 42]). More 
elaborated strategies not properly satisfying detailed bal- 
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ance {e.g., WalkSat for the K-SAT problem) could in 
principle overcome this type of barriers; however, the 
available numerical and analytical results suggest that 
also these more sophisticated randomized searches un- 
dergo an exponential slowdown, with different layers of 
states acting as dynamical traps, depending on the de- 
tails of the heuristics. 




IV. SP IN THE UNSAT REGION 



(a) 



(b) 



(c) 



In the SAT phase, where the y — > oo limit is taken, the 
convolutions |5J) filter out completely any clause- violating 
truth value assignment. This feature is extremely useful 
for satisfiable formulas, but it becomes undesired when 
our sample is presumably unsatisfiable. 

In the UNSAT region the SP equations require a finite 
value of the Lagrange multiplier y. The filtering action of 
the exponential re- weighting term in (|2Jl is then weakened 
and the messages computed by the SP equations can ve- 
hicle information pointing to states with a non vanishing 
number of violated constraints. 



A. The finite pseudo-temperature recursive 
equations 

The SP equations simplify considerably in the y ^ oo 
limit and lead to extremely efficient algorithmic imple- 
mentations, as discussed in great detail in 20]. In the 
case of finite pseudo-temperature 1/y the same simpli- 
fication cannot take place because of the presence of a 
nontrivial re-weighting factor. Still, a relatively fast re- 
cursive procedure can be written. Let us consider a vari- 
able j having Tj neighboring function nodes and let us 
compute the cavity field pdf Pj^a{h) where a G V{i). 
We start by randomly picking up one function node in 
V{j) \ a, denoted as 61, and we calculate the following 
"/i-survey" : 



:S{h)+rj+^.^5{h-l) + Tj;^^^S{h + l). 

(13) 

The function Pj^^{h) would correspond to the true local 
field pdf of the variable j in the case in which bi was the 
only neighboring clause (as denoted by the upper index). 

The following steps of the recursive procedure consist 
in adding the contributions of all the other function nodes 
in V{j) \ a, clause by clause (Fig.^: 



Vb. 



(7-1) 



(14) 



Pla'ih-l) cxp -2y9i-h) 



71b.,^,P\la\h + l) exp -2ye{h) 



Here Pjj^J/i) is an unnornialized pdf and 9{h) is a step 
function equal to 1 for h > Q and zero otherwise. The 
recursion ends after 7 = Fj — 1 steps, when the influence 



FIG. 4: Computing recursively a cavity pdf. (a) In order to 
find a single cavity pdf Pj^aih), a single clause 61 in V{j) \a 
is picked up at random and the it-survey Qb-^~tj is used to 
compute equation Hl.'i|l : (b) The contributions of all the other 
function nodes in V{j)\a are then added, clause by clause; (c) 
The pdf computed recursively after Vj — 1 iterations coincides 

with Pj^a(h). 



of every clause in V{j) \ a has been taken in account. 
The final cavity-field pdf Pj^a{h) can be found straight- 

— (T - — 1) 

forwardly by computing the pdf Pj^a W ^'^^ values 
of the field — Fj + 1 < < Fj — 1 and by normalizing it. 

As already pointed out in Section ITTll the knowledge 
of A' — 1 input cavity-field pdf 's can be used to obtain 
a single output w-survey. Let us compute for instance 
the M-survey Qa^i{u) (see always Fig. |21 for the nota- 
tion). In order to do that, we need first the cavity field 
pdf's Pj^a{h) for every j £ V{a) \ i. The parameters 
{Va^i: Va^i: Va-^i} ^^c then updated according to the for- 
mulas: 



Ja 

Va- 



0, ri^^^l-ri^., 



n=l 



where we introduced the weight factors: 



(15) 



r.,-1 -1 

^r^a - E Pj^-W^ ^i^a - E P^^-ih). 

h=i h=-rj+i 

(16) 

It should be remarked that Qa—>i{u) depends only on 
one single nontrivial rj^'^^ (from now simply referred to 
as rja^i ). We could say that a single kind of message can 
be produced, telling the receiver literal to assume the 
truth value "TRUE"; this message is transmitted along 
the edge a ^ i with a probability rja^i, corresponding 
to the probability that the only way of not violating the 
constraint a is to set appropriately the truth value of i. 

Starting from a full collection of u-surveys at a given 
time, it is possible to realize a complete update of all the 
parameters {?7a->i} by systematical application of the re- 
cursions (|13|l . 1)14(1 and of the relation H15|l : from the new 
set of M-surveys, new cavity field pdf's can be computed 
and the procedure continues until when self-consistence 
of ry's is reached. This procedure can be efficiently imple- 
mented numerically and allows us to determine the fixed 
point of the population-dynamics equations l(2Jl, iOj for 
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a general value of y. 



B. The SP-Y algorithm 

In the usual SP-inspired decimation pol |. the compu- 
tation of the local field pdf's Pi{H) is used to decide a 
truth value assignment for the most biased variables. In- 
deed, it is reasonable that a spin tends to align itself with 
the most probable direction of the local field. A ranking 
can be realized by finding all the probability weights 



I^+=^P,(iJ), Wr= P^[H), (17) 



H=l 



and by sorting the variables according to the values of a 
bias function: 



6fix(j) = \W+ - Wr 



(18) 



The local field pdf's Pj{H) can be naturally calculated 
resorting to the iterations (|13|) . (|14|l : computation is done 
simply by sweeping over the whole set of neighboring 
function nodes V{j)^ including also the contribution of 
the skipped edge a j. By fixing in the right direction 
the spin of the most biased variable, we actually reduce 
the original N variable problem to a new one with — 1 
variables. New it-surveys are then computed. Doing that 
we have to take care of fixed variables: if i is fixed, its 
cavity field pdf's must be of the form: 



Pi-,a{h) = 6{h- Ja,iSi) 



(19) 



regardless of the recursions (|14l) . The complete po- 
larization refiects the knowledge of the truth value of the 
literals depending on the spin Si. 

The procedure of decimation continues until when a 
full truth assignment has been generated or until when 
convergence has been lost or a paramagnetic state has 
been reached; in the latter cases the original formula is 
simplified according to the partial truth assignment al- 
ready generated and the simplified formula is passed to 
a specialized heuristic. Our choice of preference is the 
WalkSat algorithm 0|, which is by far more efficient 
than SA in the hard region of the 3-SAT problem, as 
we have checked exhaustively. Very briefly, the strat- 
egy of WalkSat is the following one: at each time step 
the current assignment is changed by randomly alternat- 
ing greedy moves (where the variable which maximizes 
the number of satisfied clauses if fixed) and random-walk 
steps (in which a variable belonging to a randomly cho- 
sen unsatisfied clause is selected and fiipped). WalkSat 
stops if either a satisfying assignment is found or if the 
maximum number of allowed spin flips (the "cutoff" ) is 
reached (see Ref. another recently analyzed and 

very efficient heuristics). 

When working at finite pseudo-temperature, we have 
to take in account the possibility that some non optimal 



fixing is done in presence of thermal "noise" . After sev- 
eral updates of the u-surveys some biases of fixed spins 
may become smaller than the value they had at the time 
when the corresponding spin was fixed. Certain local 
fields can even revert their orientation. Small or positive 
values of an index function like: 



^backtrack 



(j) 



(20) 



can track the appearance of such dangerous fixed spins 
and this information can be used to implement some "er- 
ror removal" procedure; for instance, a simple strategy 
can be devised where both unfixing and fixin g m oves are 
performed at a fixed ratio < r < 0.5 (see [23 for an- 
other backtracking implementation). 

The actual SP with finite y simplification procedure 
(SP-Y) will depend not only on the backtracking fraction 
r, but even more on the choice of the inverse pseudo- 
temperature y. The simplest possibility is to keep it 
fixed during the simplification, but one may choose to 
dynamically update it, in order to stay as close as pos- 
sible to the maximum y* of the free energy functional 
$(?/) (which corresponds to select the ground state in 
the 1-RSB framework, as we have seen in Section Hill . 

The equations (|10|) . Ijlll) can be rewritten in the fol- 
lowing form, suitable for numerical computation: 



ln(l+(e-^-l) n ^'^^ 

ieV{a) 

^ln( [] C^a)], 
ieV[a) 



$ny) = --ln (Q), 



(21) 
(22) 



In Fig. 13 we give a summary of the simplification pro- 
cedure in a standard pseudo-code notation. The first 
release of the SP-Y code can be downloaded from lial. 



V. OPTIMIZING THE ENERGY BELOW THE 
THRESHOLD STATES 

As we have already discussed in Section ITm it is ex- 
pected that, in the thermodynamical limit, any local 
search algorithm gets trapped in the vicinity of exponen- 
tially numerous threshold states with energy eth and that 
any local heuristics is in general unable to find the opti- 
mal assignment in the thermodynamical limit. To verify 
this prediction, we conducted various experiments, both 
in the SAT and in the UNSAT phase, focusing on the 
comparison between the WalkSat heuristics performance 
after and before different kinds of SP-Y simplification. 
In most of the situations, we decided to analyze carefully 
single large-sized samples instead of a larger number of 
smaller problems: we verified in fact that the sample- 
to-sample fluctuations tend to be unrelevant for size of 
order 10** and larger. 
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INPUT: a Boolean formula T in conjunctive 
normal form; a backtracking ratio 
r; optionally, a fixed inverse 
pseudo-temperature yin 

OUTPUT: a simplified Boolean formula T' in 

conjunctive normal form (ideally empty) and 
a partial truth value assignment for the 
variables of T (ideally a complete one) 

0. For each edge a — > i of the factor graph, 
randomly initialize the r]a^i G {0, 1} 

1. IF there is a fixed as input, put 

y* — yin, ELSE after a fixed number of 
steps, determine by bisection the position 
of the free-energy maximum y* 

2. Compute all the fixed point it-surveys, 
using equations ( 1131 ) , ( 1141 ) , ( 1151 ) and putting 

y = y' 

3. IF the population dynamics equations 
converge , 

3.1 FOR every unfixed variable i, compute 
the local field pdf using ( [Tst , ( IT4t 

3.2 Extract a random number q in [0,1] 

3.3 IF g > r, Sort the variables according 
to the index function 1181 ) , and fix the 
most biased variable 

3.4 ELSE IF g < r. Sort the variables 
according to the index function ( 1201 ) 
and unfix the highest ranked variable 

3.5 IF all the variables are fixed, RETURN 
the full truth value assignment and an 
empty sub-formula, ELSE, go to 1. 

4. ELSE IF the population dynamics equations 
do not converge, simplify the formula by 
imposing the already assigned truth values, 
RETURN the partial solution and the obtained 
sub-formula 



FIG. 5: The SP-Y simplification algorithm. 



A. SAT region 

The aim of the first set of experiments was to check the 
actual existence of the threshold effect. We ran Walk- 
Sat over different formulas in the hard-SAT region, with 
fixed a = 4.24 and sizes varying between N — 10'^ and 
N = 10^, reaching a maximum cutoff of 10^° spin flips. 
The obtained results are plotted in Fig. |5| the Gardner 
energy is also reported for comparison with the data. 
Even if for small-size samples the local search algorithm 
is able to find a SAT assignment, for larger formulas 
{N ~ O{10'^)) WalkSat does not succeed in reaching the 
ground state, its relaxation profile suffers of critical slow- 
down, and saturates at some well defined level. This is 
actually expected, because the Gardner energy becomes 



N = 1x10 before decimation ' — ■ — ' 

^j,0 _ N = 5x10 before decimation f; 

^ N = 1x10„ before decimation 

N = 5x10 before decimation o 

o N = 1x10 before decimation ^--t-- 

9 ^ Gardner energy 

10"' : , ■ 




Cutoff 



FIG. 6; Threshold energy effect in SAT region. The WalkSat 
performance for various samples of different sizes and a — 
4.24 is presented. With increasing size, the curves appear to 
saturate above the Gardner energy. An arrow indicates that 
the next data point corresponds to a SAT assignment. 



0{1) only for N ^ 10^ or larger, and for a smaller num- 
ber of variables the threshold effect should be negligible 
when compared to finite size effects. 

We remind that WalkSat cannot be considered as an 
equilibrium stochastic process and that it is not possible 
to infer that its saturation level coincides with the sam- 
ple threshold energy; we can anyway claim that Walk- 
Sat is unable to explore the full energy landscape of the 
problem, and that the enormous number of non opti- 
mal valleys is unavoidably hiding the true ground states. 
Plateaus in the relaxation profiles of WalkSat have in- 
deed been already discussed in |23| and ascribed to 
metastable states acting as dynamical traps. 

For the = 10"* formula a trapping effect becomes 
clearly visible in our experiments, but the saturation 
plateau is below the Gardner lower bound. The finite- 
size fluctuations are still of the same order of the energy 
gap between the ground and the threshold states and 
the experimental conditions are distant from the ther- 
modynamical limit. When the size is increased up to 10^ 
variables, the saturation level moves finally between the 
full RSB lower bound and the I-RSB upper bound for 
eth- 

The efficiency of the SP-Y simplification strategy 
against the glass threshold is discussed in Fig. [7| We 
simplified a single randomly generated formula (N = 10^, 
a — 4.24) at several fixed values of pseudo-temperature. 
The solid line shows for comparison the WalkSat results 
after a standard SP decimation (i.e., y ~* oo): the 
ground state, E — 0, is reached as expected, after a 
rather small number of spin flips. The same happens 
after SP-Y simpliflcations performed at a large enough 
inverse pseudo-temperature (y > 4); one should remind 
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Before decimation i — o — 
SP-Y = 1 .5 decimation ■ 
SP-Y = 1.5 (backtracking) ^-o- 
SP-Y = 3.5 decimation 
SP-Y = 4.5 decimation ^ ^ 

SP decimation 

Gardner energy 




10 10 

Cutoff 



10"" 



Before decimation 
SP-Y = 2.5 
SP-Y = 35 
SP-Y = 2.5, backtracking r = 0.2 
SP-Y updating, backtracking r = 0.2 
Gardner energy 
' Ground state lower bound 



10' 
Cutoff 



FIG. 7: Efficiency of SP-Y in the SAT region (single sample 
with N = 10^ variables and a = 4.24). After SP-Y simplifi- 
cation, WalkSat is generally able to find solutions below the 
Gardner threshold; in some cases, it succeeds even in finding 
complete satisfying assignment. An arrow indicates that the 
next data point corresponds to a SAT assignment. 



FIG. 8: SP-Y performance in the UNSAT region (single sam- 
ple with TV = 10^ variables and a = 4.29). Several simplifi- 
cation strategies are compared; the need for backtracking is 
readily visible, and its introduction allows to reach energies 
closer to the ground state than to the Gardner lower bound. 



B. UNSAT region 



indeed that in the SAT region the optimal value for y 
would be infinite, and that in that limit the SP-Y re- 
cursions reduce to the SP equations. After simplification 
with smaller y's, the WalkSat cooling curves reach again 
a saturation level, which is nevertheless below the Gard- 
ner energy, unless y is too small: the threshold states of 
the original formula have not been able to trap the local 
search, even if the ground state becomes inaccessible. As 
we have indeed already discussed, working at finite tem- 
perature increases the probability of violating a clause 
when doing a spin fixing, and this is particularly evident 
in the SAT region where every assignment that does not 
satisfy some constraint should be filtered out. 

The procedure is intrinsically error prone, and it will 
allow in general to reach only "good states" , but not the 
true optimal solutions (the smaller the parameter y, the 
higher the saturation level will be) . As we shall discuss in 
the next section, the use of backtracking partially cures 
the accumulation of errors at finite y: the saturation level 
can in fact be significantly lowered by keeping the same 
pseudo-temperature and introducing a small fraction of 
backtrack moves during the simplification. In Fig. [71 the 
data for y = 1.5 shows the importance of backtracking. 
While the run of SP-Y without backtracking has led to a 
plateau above Gardner energy, with the introduction of 
backtrack moves we find energies well below the thresh- 
old. 



When entering the UNSAT region, the task of look- 
ing for the optimal state becomes harder. The expected 
presence of violated constraints in the optimal assign- 
ments really forces us to run the simplification at a finite 
pseudo-temperature. Unfortunately, after many spin fix- 
ings, the recursions H13|) . 114|) stop to converge for some 
finite value of y before the maximum of the free energy is 
reached, most likely because the sub-problem has entered 
a full RSB phase. At this point one should switch to a 
2-RSB version of SP which we did not realize, yet. Alter- 
natively, one could try to run directly the final heuristic 
search (hoping that the full RSB sub-system is not ex- 
ponentially hard to optimize) or more simply one may 
continue the decimation process by selecting the largest 
y for which the computation converge. We decided to 
implement the latter choice until either convergence is 
lost independently from the value of y or a paramagnetic 
state is reached. 

In our experiments we studied several 3-SAT sample 
problems belonging to the 1-RSB stable UNSAT phase. 
We employed WalkSat as an example of standard well- 
performing heuristics. Although WalkSat is not opti- 
mized for unsatisfiable problems, in the 1-RSB stable 
UNSAT region it performs still much better than any 
basic implementation of SA. We observed anyway that, 
even after 10^*^ spin flips, the WalkSat best assignments 
were still quite distant from the Gardner energy, for var- 
ious samples of different size and a. In Fig. [S] we show 
the results relative to many different SP-Y simpliflcations 
with various values of y and r for a single sample with 
N = 10^ and a = 4.29. The simplification produced al- 
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FIG. 9: Backtracking efficiency. Many SP-Y simplifications 
of a single sample with A'^ — 10* variables and a — 4.35 
have been performed at fixed but different values of pseudo- 
temperature; the introduction of a small fraction of backtrack- 
ing moves eliminates essentially the need for a time consuming 
optimization of the parameter y. The empty points refer to 
simplifications without backtracking, the full points to sim- 
plifications with a backtracking ratio r = 0.2. A diamond 
indicates that the simplification process stopped because of 
loss of convergence, a circle because of finding a completely 
unbiased paramagnetic state, and the squares indicates that 
the loss of convergence happened at an advanced stage where 
some clause-violating assignments have already been intro- 
duced by SP-Y. 



ways an improvement in the WalkSat performance, but, 
in absence of backtracking, we were unable to go below 
the Gardner lower bound (although we touched it in some 
cases: in Fig.|Hlwe show the data for a simplification at 
fixed y = 2.5; a simplification with runtime optimization 
of y reached the same level). 

The relative inefficiency of these first attempts of sim- 
plification was not due to the threshold effect alone, 
but also to an extreme sensitivity to the choice of y, as 
pointed out by a second set of experiments making use of 
backtracking. We performed first an extensive analysis of 
the simultaneous optimization of y and r, using smaller 
samples in order to produce more experimental points. 
After some trials, the fraction r = 0.2 appeared to be the 
optimal one, at least for our implementation, and in the 
small region under investigation of the K-SAT phase dia- 
gram. The data in Fig.|51refers to a formula with N = 10"* 
variables and a = 4.35. The dashed horizontal line shows 
the WalkSat best energy obtained on the original formula 
after 10^ spin flips. The WalkSat performance was seri- 
ously degraded when simplifying at too small values of y, 
but the introduction of backtracking cured the problem, 
identifying and repairing most of the wrong assignments. 
The WalkSat efficiency became actually almost indepen- 
dent from the choice of pseudo-temperature, whereas in 



absence of error correction a time consuming parameter 
tuning was required for optimization. 

Coming back to the analysis of the sample of Fig.|S| the 
backtracking simplifications allowed us to access states 
definitely below the Gardner lower bound. The combi- 
nation of runtime y-optimization and of error correction 
was even more effective: after a rather small number 
of spin fiips, WalkSat reached a saturation level strik- 
ingly closer to the ground state lower bound than to the 
Gardner energy. A further valuable effect of introduc- 
tion of the backtracking was the increased efficiency of 
the formula simplification itself: in the backtracking ex- 
periments, SP-Y was able to determine a truth value for 
more than 80% of the variables before losing convergence, 
while without backtracking, the algorithm stopped on av- 
erage after only 40% of fixings. 

All the samples analyzed in the previous sections were 
taken from the 1-RSB stable region of the 3-SAT prob- 
lem, where the equations Q, Q are considered to be 
exact. For a > 4.39, the phase becomes full RSB and 
SP loses convergence before the free energy $(y) reaches 
its maximum from the very first step of the decimation 
procedure. While a full RSB version of SP would most 
likely provide very good results, SP-Y still can be used 
in a sub-optimal way by selecting the largest value of y 
for which convergence is reached. Numerical experiment 
show that indeed the performance of SP-Y are in good 
agreement with the analytical expectations. However, it 
should be noticed that in this region the use of SP is not 
necessary. Although the performance of WalkSat and SA 
can be improved by the SP simplification, the SA alone 
is already able of finding close-to-optimum assignments 
efficiently (as expected for a full RSB scenario) and be- 
haves definitely better than WalkSat. 



VI. CONCLUSIONS 

In this paper, we have displayed the performance of SP 
as an optimization device and shown that configurations 
well below the threshold states can be found efficiently. 
Similar results are expected to hold also for random sat- 
isfiable instances very close to the critical point for which 
the combined use of finite pseudo-temperature and back- 
tracking could give access to the SAT optima. 

It would be of some interest to analyze further im- 
provements of the decimation strategies as well as to con- 
sider more structured factor graphs within a variational 
framework, in which some correlations can be put under 
control. 

A possible application of SP-Y-like algorithms can 
be found in information theory: lossy data compression 
based on Low Density Parity Check schemes leads to op- 
timization problems which are indeed very similar to the 
one discussed in this paper. 
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